Privacy Preserving Private Frequent Itemset Mining via Smart Splitting
Authors
Abstract
Recently, there has been growing interest in designing differentially private data mining algorithms, and a variety of such algorithms have been proposed for mining frequent itemsets. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining, with practical importance in a wide range of application areas such as decision support, web usage mining, and bioinformatics. In this paper, we explore the possibility of designing a differentially private FIM algorithm that not only achieves high data utility and a high degree of privacy but also offers high time efficiency. To this end, we propose a differentially private FIM algorithm based on the FP-growth algorithm, referred to as PFP-growth. PFP-growth consists of a preprocessing phase and a mining phase. In the preprocessing phase, a novel smart splitting method transforms the database to improve the tradeoff between utility and privacy; for a given database, this phase needs to be performed only once. In the mining phase, to offset the information loss caused by transaction splitting, we devise a run-time estimation method that estimates the actual support of itemsets in the original database. In addition, by leveraging the downward closure property, we put forward a dynamic reduction method that dynamically reduces the amount of noise added to guarantee privacy during the mining process. A formal privacy analysis shows that PFP-growth is differentially private, and extensive experiments on real datasets show that it substantially outperforms the state-of-the-art technique.
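The abstract names the ingredients of PFP-growth but not their details, so the following is only a minimal sketch of the general pattern, not the authors' algorithm. It assumes: (1) plain truncation of each transaction to at most `max_len` items as a simpler stand-in for smart splitting, (2) Apriori-style level-wise search in place of FP-growth, (3) Laplace noise on support counts with the privacy budget ε split evenly across levels, and (4) a public item universe. All function and parameter names (`truncate`, `laplace`, `noisy_frequent_itemsets`, `max_len`, and so on) are hypothetical. It illustrates why bounding transaction length matters: a record with at most `max_len` items supports at most C(max_len, k) k-itemsets, which caps the noise scale, and how downward closure limits which itemsets need noisy counts at all.

```python
import itertools
import math
import random
from typing import Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def truncate(transaction: Sequence[str], max_len: int) -> List[str]:
    """Keep at most max_len distinct items, chosen deterministically (sorted).

    Hypothetical stand-in for smart splitting, NOT the paper's method:
    bounding record length bounds how many k-itemsets one record supports.
    """
    return sorted(set(transaction))[:max_len]


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two Exp(rate=1/scale) draws.

    Fine for illustration; not a hardened differential-privacy sampler.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_frequent_itemsets(
    db: List[Sequence[str]],
    universe: Sequence[str],
    threshold: float,
    epsilon: float,
    max_len: int,
    max_k: int,
    seed: int = 0,
) -> Dict[Itemset, float]:
    """Level-wise (Apriori-style) mining with Laplace-noised supports.

    `universe` is the public item domain; deriving it from `db` would leak.
    """
    if epsilon <= 0 or max_len < 1 or max_k < 1:
        raise ValueError("epsilon, max_len and max_k must be positive")
    rng = random.Random(seed)
    data = [frozenset(truncate(t, max_len)) for t in db]
    max_k = min(max_k, max_len)  # no truncated record holds a larger itemset
    eps_level = epsilon / max_k  # sequential composition: levels sum to epsilon
    candidates = [frozenset([i]) for i in sorted(set(universe))]
    result: Dict[Itemset, float] = {}
    for k in range(1, max_k + 1):
        if not candidates:
            break
        # Adding or removing one record of <= max_len items changes at most
        # C(max_len, k) k-itemset counts by 1: the L1 sensitivity at level k.
        scale = math.comb(max_len, k) / eps_level
        frequent: List[Itemset] = []
        for c in candidates:
            support = sum(1 for t in data if c <= t)
            noisy = support + laplace(scale, rng)
            if noisy >= threshold:
                result[c] = noisy
                frequent.append(c)
        # Downward closure: a (k+1)-itemset can be frequent only if all of its
        # k-subsets are, so only such itemsets become next-level candidates.
        frequent_set = set(frequent)
        next_level = set()
        for a, b in itertools.combinations(frequent, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in frequent_set
                for s in itertools.combinations(union, k)
            ):
                next_level.add(union)
        candidates = sorted(next_level, key=sorted)
    return result
```

For example, `noisy_frequent_itemsets([["a", "b"]] * 200 + [["c"]] * 2, universe=["a", "b", "c"], threshold=100, epsilon=10.0, max_len=2, max_k=2)` reports `{a}`, `{b}` and `{a, b}` with noisy supports close to 200 and drops `{c}`. Truncation silently lowers the supports of itemsets that occur in long transactions; the run-time estimation method described in the abstract is meant to offset the analogous information loss from splitting, which this sketch does not attempt.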
Similar resources
A Study of Differentially Private Frequent Itemset Mining
Frequent sets play an important role in many data mining tasks that search for interesting patterns in databases, such as association rules, sequences, correlations, episodes, classifiers and clusters. Frequent Itemset Mining (FIM) is one of the most well-known techniques for extracting knowledge from datasets. In this paper, differential privacy aims to get means to increase the accuracy of queries fr...
Personalized Privacy-Preserving Frequent Itemset Mining Using Randomized Response
Frequent itemset mining is an important first step of association rule mining, which discovers interesting patterns in massive data. There are increasing concerns about privacy in frequent itemset mining. Some works have been proposed to handle this kind of problem. In this paper, we introduce a personalized privacy problem, in which different attributes may need differen...
A Survey on Privacy Preserving Association Rule Mining of Outsourced Databases
Data mining finds useful patterns in large datasets. Frequent itemset mining and association rule mining are two popular data analysis techniques, broadly used in different applications. Personal or sensitive information of individuals, industries or organizations must be kept private before it is shared for data mining. Hence privacy preserving data mining has become ...
Privacy-Preserving Frequent Itemset Mining for Sparse and Dense Data
Frequent itemset mining is a task that can in turn be used for other purposes such as association rule mining. One problem is that the data may be sensitive, and its owner may refuse to hand it over for analysis in plaintext. There exist many privacy-preserving solutions for frequent itemset mining, but in any case enhancing privacy inevitably degrades efficiency. Leaking some less sensitive i...
CS 730R: Topics in Data and Information Management
1. Summary. In this paper, the authors propose a differentially private algorithm for mining frequent itemsets. This work differs from the other privacy-preserving miners in the literature: this algorithm mines itemsets by enforcing cardinality constraints on the transactions in the dataset. In particular, the authors study how the reduction of the cardinality of the ...
Publication date: 2016